High Particle Number Emissions Determined with Robust Regression Plume Analysis (RRPA) from Hundreds of Vehicle Chases

Particle number emission factors were determined for hundreds of individual diesel and gasoline vehicles in their real operation on Finnish highways and regional roads in 2020 with one-by-one chase measurements and Robust Regression Plume Analysis (RRPA). RRPA is a rapid way to analyze data from a large number of vehicle chases automatically. The particle number emission factors were determined for four ranges of particle diameters (>1.3, > 2.5, > 10, and >23 nm). The emission factors for most of the measured vehicles were observed to significantly exceed the non-volatile particle number limits used in the most recent European emission regulation levels, for both light-duty and heavy-duty vehicles. Additionally, most of the newest vehicles (covering regulation levels up to Euro 6), for which the particle number emission regulations (non-volatile >23 nm particles) apply, showed emission factors of the >23 nm particles clearly above the regulation limits. Although the experiments included measurements of real-world plume particles (mixture of non-volatile and semi-volatile particles) and not only the non-volatile regulated particles, it is important to note that the emissions of regulated particles were also estimated to exceed the limits, based on non-volatile >23 nm particle fraction from curbside studies. Moreover, the emission factors of the >1.3 nm particles were mostly about an order of magnitude higher compared to the >23 nm particles.


S1.1 Particle Size-Independent Corrections
All measurement devices of which data are used in this study have time responses of ∼1 s and they also record the data in 1 s time resolution. Due to different sample residence times in the sampling lines for different devices and due to possible differences between the clocks determining the time stamps for the recorded data, the measurement data were first synchronized by using peaks in time series data from flame tests performed several times during the measurements. The flame tests were conducted by bringing liquefied petroleum gas flames near the tip of the inlet of the measurement van for short periods. Due to a small uncertainty in obtaining an accurate time synchronization between the devices, the measurement data were also averaged into 3 s time resolution before further analysis.
All measured particle concentrations were multiplied with the dilution ratio (DR) of the used bifurcated flow diluter set. This type of diluter directs the major part of the flow directly through a HEPA filter and only a small part past the filter. The proportion of the total flow rate to the flow rate bypassing the filter basically determines the DR of the diluter. The total DR of the diluter set consisting of two similar diluters in series for >50 nm particles was measured to be 98±4 by having a source of a test dioctyl sebacate aerosol with a known concentration upstream of the diluter set and by measuring the concentration downstream. The measured particle concentrations were also corrected for the maximum detection efficiencies of the CPCs. These were also obtained from the dioctyl sebacate aerosol measurement, where the particle size distribution was measured and was with enough large particles. The obtained maximum detection efficiencies are 72.84% for the PSM+CPC combination (> 1.3 nm), 100.4% for the CPC > 2.5 nm, 82.89% for the CPC > 10 nm, and 99.91% for the CPC > 23 nm. The detection efficiencies of the CPCs decrease with decreasing particle size close to their cut-off diameters, but they are handled here by assuming step functions where the detection efficiency is zero for particles smaller than the cut-off diameter S-2 and the previously mentioned maximums for particles larger than the cut-off diameter.

S1.2 Particle Size-Dependent Corrections
Traditionally, experiments including ultrafine particles have required corrections to the measured particle concentrations due to diffusional losses of particles onto the inner walls of the sampling lines. These corrections involve particle size-dependent correction factors which the measured concentrations need to be multiplied with. Here we calculated these correction factors using the functions by Gormley and Kennedy S1 (for straight tubes with laminar flows) and Brockmann S2 (for straight tubes with turbulent flows). The sampling system for the CPCs consisted of 4.5 m of tubes in total. The first part was a fixed 2.5 m line installed in ATMo-Lab. The flow rate through this main line with the inner diameter of 9 mm was 30 slpm, denoting turbulent flow. The next part of the sampling system was 1 m line with the inner diameter of 6 mm. This line was used for the CPCs; the flow rate being 6.6 slpm, denoting laminar flow. The last part consisted of separate parallel 1 m lines with the inner diameters of 4 mm for each CPCs. All the last lines had laminar flow, the flow rates depending on the CPC (2.5 slpm for the PSM+CPC combination (> 1.3 nm), 1.5 slpm for the CPC > 2.5 nm, and 1 slpm for the rest). There were four sharp bends in these lines in total; two in the main line, one in the next line, and one in each of the last parallel lines. Because the front inlet was pointing downwards to prevent excessive dust passing the measurement system, the sampling was not isokinetic, but it could have not been achieved anyway because the driving speed was not a constant. By taking also aspiration, gravitational losses, and inertial losses in the bends into account, aspiration becomes notable with particles larger than 100 nm and inertial losses in the bends with particles larger than 1000 nm. For example, the aspiration efficiency of 100 nm particles, according to Hangal and Willeke S3 , is 99% with the driving speed of 60 km/h and 94% with the driving speed of 120 km/h. Nevertheless, because the major fraction of particle number in vehicles emissions is typically in particle sizes much smaller than 100 nm and this study focuses on particle S-3 number only, these sampling system effects on larger particles were not corrected.
The obtained correction factors (omitting the effects on large particles) for the four particle size ranges of this study can be found in Table S1, from which it can be seen that the correction is significant especially for the smallest particle sizes. In addition to particle sizes, these corrections also depend, e.g., on the lengths and inner diameters of the tubes and the flow rates, of which determination involves some measurement uncertainties, but the largest uncertainty arises from the selection of the representative particle size within a specific particle size range. The typically selected geometric mean diameter of a size range may not truly represent the particle size with which the calculations should be done due to non-linearity in the correction functions. The representative particle size of a size range depends also on the shape of the particle size distribution within the size range, which cannot be easily determined. We calculated also the ranges of variation for the correction factors by assuming that the variation is caused only by the selection of the representative particle size. Table S1: Correction factors for different particle size ranges. The first values in the cells represent the factors which are applied to the measured data and they are calculated using the geometric mean diameter of a size range, i.e., the 50th percentile of a size range in log-scale (for example, 1.8 nm for the size range of 1.3-2.5 nm). The values in parentheses represent ranges of variation for the factors, calculated by assuming that the representative particle size within a size range can vary between the 25th and the 75th percentile of a size range in log-scale (for example, 1.53 and 2.12 nm for the size range of 1.3-2.5 nm). For the size range of >23 nm particles, the factors are calculated by selecting 50 nm for the mean diameter and 23 nm and 100 nm for calculating the ranges of variation.

S-4
Novel methods to calculate the corrections were also applied in this study. These include correcting elevated diffusional losses of particles in bent parts of the sampling lines and correcting the DR of the bifurcated flow diluters for very small particles (due to increased diffusional losses). Bends in tubes also increase losses of larger particles due to particle impaction on the inner walls but this effect was omitted in this study, since it is negligible for the ultrafine particle size range of interest in this study. The correction factors for the diffusional losses in the bends (four sharp bends) in the sampling lines were calculated using the functions by Olin and Dal Maso S4 . It can be seen from Table S1 that this effect is notable only for the lowest particle size range (1.3-2.5 nm).
The DR of the used bifurcated flow diluter set was determined only for particles larger than 50 nm, but, in reality, DR is higher for the smaller particles due to increased diffusional losses in diluters. This elevated DR can be corrected by multiplying the results with another diffusional losses-based correction factor. Here we used a method used by Collins S5 , in which the diffusional losses inside a diluter are experimentally determined for different particle sizes and the obtained penetration vs. particle diameter curve is used to find L/Q, the ratio of the bypassing tube length (L) and the bypassing flow rate (Q), producing this curve with the function by Gormley and Kennedy S1 . The diluter set used in this study was not tested for smaller particles, but here we use the losses data determined with silver particles for another similar diluter. The smallest particles used in the test were about 3 nm in diameter, because handling even smaller particles accurately becomes very difficult. The fitted L/Q value obtained from the test (with one diluter with the total flow rate of 2.5 slpm and DR of 12, denoting Q of 0.21 slpm) is 0.46 m slpm −1 , resulting in L = 96 mm, which is much longer than the physical length of the bypassing tube in the diluter. This suggests that the function by Gormley and Kennedy S1 do not suit very well for this kind of diffusional losses calculation (because it is derived only for straight tubes with some additional assumptions), but is, however, the best option currently available. On the other hand, the curve of the function by Gormley and Kennedy S1 fits surprisingly well onto the experimental data in the S-5 study by Collins S5 , in which the lowest particle diameter was 9 nm. However, predicting the losses for the smaller particles requires extrapolation, of which correctness cannot be easily verified. Additionally, the obtained L value of a diluter may depend also on the flow rate (during the chase experiment, it was 6.6 slpm, instead of 2.5 slpm during the diffusional losses test). Table S1 showing the obtained correction factors implies that the effect of the diffusional losses of particles in the diluter set is notable for the lowest particle size range but the correction should not be neglected even for the next size range (2.5-10 nm).
It can be seen from the total correction factors in Table S1 that the lowest particle size range required remarkable corrections, the total correction factor being 16.39. The uncertainty of this correction factor is also high, −43%/+104%, meaning that the concentrations of 1.3-2.5 nm particles reported in this study may be over-or under-estimated with the factor of about 2 (the reported concentrations can be decreased 43% or increased 104%, in maximum). The uncertainty is still relatively very high also for the next particle size range, 2.5-10 nm, (−25%/+58%) but the correction factor itself is already much lower (2.163).
It should be noted that because these novel correction methods (diffusional losses due to bends in tubes and diluters) have not been applied in most of the studies in literature, the concentrations reported in this study may not be directly comparable with previous studies reporting only the lower limits of the concentrations. The subtotal correction factors in Table S1, denoting the effect of the novel correction methods, can be used to make the results of this study comparable. The subtotal correction factor for, e.g., 1.3-2.5 nm particles is 2.108, denoting that the concentrations of 1.3-2.5 nm particles reported here should be divided by 2.108 in order to compare them with studies where these novel corrections have not been applied but the measurement systems yet contained (a similar number of) bends in tubes and (similar bifurcated flow) diluters. It should, however, be noted that these corrections are system-specific and may be totally neglected with some measurements systems (using only straight tubes and no diluters), but the magnitude of them (∼2.1 for the 1.3-2.5 nm particles, ∼1.2 for the 2.5-10 nm particles, and negligible for particles larger than ∼10 nm) S-6 should be kept in mind.

S2.1 Instantaneous Emission Factors
First, the method for deriving instantaneous emission factors (EFs) is presented. They hold only for a given time range, on the order of a few seconds, including time spent by aspiration of the background air by the engine, the combustion process, and transferring of the exhaust within the exhaust system and in the background air-after releasing from the tailpipe-to the location where the exhaust sample is measured.
Let a particle number concentration (in 1/cm 3 in NTP conditions) be N raw in the raw exhaust, N bg in the background air, and N m measured at any location. The measured concentration is thus where f denotes the volume fraction of the raw exhaust for a given measurement location (see, e.g., Herndon et al. S6 ). Eq S1 can also be expressed as By using the assumption that particles dilute within a turbulent exhaust plume with rates equal to CO 2 , S7 eq S2 leads to where [CO 2 ] raw , [CO 2 ] bg , and [CO 2 ] m denote the CO 2 concentrations (in ppm) in the raw exhaust, in the background air, and in the measured sample, respectively.

S-7
Emission factor (EF) in the units of number of emitted particles per 1 kg of emitted CO 2 is defined as S4) where E N is the emission rate of particles (in 1/s), E CO 2 is the emission rate of CO 2 (in kg/s), V is the volumetric flow rate of exhaust (in cm 3 /s in NTP conditions), 10 −6 ppm −1 comes from the conversion of ppm to unity, and ρ CO 2 is the density of CO 2 (= 1.83×10 −6 kg/cm 3 in NTP conditions). After cancellingV out and substituting the value for ρ CO 2 , eq S4 becomes Many studies (e.g., Herndon et al. S6 , Zavala et al. S8 ) utilize the approximations, to simplify eq S3 to which is the first part of the function for EF. Hence, EF can be obtained by dividing the excess particle number concentration (N m − N bg ) with the excess CO 2 concentration The first approximation (eq S6) is generally valid since [CO 2 ] raw is on the order of 100,000 ppm and [CO 2 ] bg on the order of 400 ppm. The latter approximation (eq S7) is also typically valid with many pollutants, but not always with particle emissions, as is the case here, since particle concentrations in raw exhaust can be close to the background concentrations or even close to zero, e.g., for vehicles equipped with particle filters.
Therefore, we utilize here a slightly different approach in deriving EF. Firstly, since the background air aspirated by the engine already contains CO 2 (with the concentration of [CO 2 ] bg ), its concentration in exhaust, [CO 2 ] raw , has not entirely originated from combustion, but in eq S5. Secondly, because the flow rates of exhaust and intake air are almost equal, the net contribution of a vehicle to the atmospheric particle number concentration is N raw − N bg instead of N raw . In other words, only the increase of the pollutant loading in a fixed air parcel aspirated by the engine and released from the tailpipe is consider emission. Now eq S5 becomes Eq S3 can be arranged to which is the first part of eq S9. Eventually, both of these two approaches lead to the same function for EF, Because N raw can be lower than N bg , due to combusting particles existing in the background air aspirated by the engine and due to filtering particles formed during the combustion process, EF can be negative as well. By assuming a vehicle with zero particles in its exhaust is driving in the background particle concentration of 10 6 cm −3 (e.g., driving in an exhaust plume of a highly-emitting vehicle), we obtain the value for the most negative EF possible, which is on the order of −10 13 /kg CO 2 .

S2.2 Averaged Emissions Factors
By simply averaging EFs from a longer time range, such as from a single chase event, does not output a real EF-average because the exhaust flow rate,V , can vary. Therefore, averaging by usingV values as weights would be needed (see Wihersaari et al. S9 ), but is typically S-9 unavailable in the case of chasing a large number of vehicles on public roads.
In many studies involving exhaust plume measurements (e.g., Canagaratna et al. S10 , Ježek et al. S11 ), instantaneous EFs are averaged over the time spent in the plume by dividing the time integral of the excess pollutant concentration by the time integral of the excess CO 2 concentration, as originally proposed by Hansen and Rosen S12 , The integration method (eq S12) explicitly needs the background values, N bg and [CO 2 ] bg , which are also assumed constants and can have substantial effects on the calculated EFs, especially when EFs are low or dilution is strong. Because the background values cannot practically be measured simultaneously with measuring the exhaust sample in chase experiments, they are typically determined before and/or after the chase events with no other nearby traffic. This is always not possible, and uncertainties arise when the background concentrations are observed to differ between the data before and after the chase event or when determining them is not successful due to other nearby traffic or due to drifting background S-10 measurement data. Differing and drifting data can be caused, e.g., when the surrounding area changes during the measurement from a forest area to a field area or vice versa. Using eq S12 and the assumption of constant background concentrations, EFs can be calculated with the equation where ⟨N m ⟩ and ⟨[CO 2 ] m ⟩ are the time-averages of N m (t) and [CO 2 ] m (t), respectively.
Another method to determine average-EFs from chase events without information onV is the slope method.
The assumption of a constant EF within a chase event is needed also in the slope method. In addition to a particle number concentration measured at a specific time moment, N m , let a particle number concentration measured at another time moment be N m ′ = N m + ∆N , and similarly for the CO 2 concentrations, Using the assumption of a constant EF, EFs are equal for the both time moments. Based on eq S11, this leads to an equation (also the background concentrations are assumed constants), which, by substituting N m ′ and [CO 2 ] m ′ , becomes By arranging the terms in eq S15, we obtain and further, S-11 By combining eqs S11 and S17, we finally obtain × 5.47 × 10 11 cm 3 ppm/kg CO 2 , which denotes that the EFs can be calculated with the slope in a N vs. [CO 2 ] plot even without explicit information on the background concentrations of particle number and CO 2 .
When the both assumptions (a constant EF and constant background concentrations) are adequate for a chase event, the data in the N vs. [CO 2 ] plot are, in theory, scattered so that the data points form a line. Otherwise (a varying EF and/or varying background concentrations), the slope of the linear fit on the data points represent one kind of an average EF for the chase event.
In this study, signals caused by other traffic in the vicinity of the chasing and the measured vehicle are neglected from the EF analysis using a robust regression method, in addition to the ordinary one. We selected the Matlab's default weight function, bisquare weighting, for robust regression. It uses an iteratively reweighted least squares algorithm, which sets the weights for all data points used in linear regression. It results in smaller weights for data points which are far from a fitted line, i.e., for outliers.

S2.3 Example Chase Events Analyzed with the Integration and
Slope Methods Figure S1 presents time series and scatter plots for number concentration of particles larger than 23 nm (N >23 ) and for [CO 2 ] measured in the vicinity of two example chase events.
Example 1 (Fig. S1a,b) represents a case where the exhaust plume has been captured well (a clear increase in [CO 2 ] during the chase) and where the measured vehicle has a high EF >23 (a clear and extensive increase in N >23 ). Example 2 (Fig. S1c,d) represents a case with a not very well captured plume and with lower EF >23 . It also demonstrates how the uncertainty in determining representative N bg affects the calculated EF in a case where the These NBG measurements were, however, not performed intentionally but are, instead, more like measurements between chase events; thus, they may include data from other traffic. In Determining EFs with the slope method (eq S18) requires only the slope, ∂N/∂[CO 2 ], which can be obtained from the linear fit (ordinary or robust regression) over the scatter plots (Fig. S1b,d). Standard errors of the slopes are considered their uncertainties. The data points in the scatter plots are averaged to 3 s for accounting possible uncertainties in the time synchronization between the devices.
The parameters and the results of the integration and the slope method for the example S-14 chase events are presented in Table S2. The uncertainties in the EFs are calculated with the error propagation law from the uncertainties in the parameters. It can be seen that, in example 1, the both methods output EFs of similar magnitudes and that the chosen background time range does not greatly affect the results from the integration method. In example 2, the output EFs are very different between the methods, and the chosen background time range affects the result greatly. The uncertainties in the EFs output by the integration method are also relatively very high, mainly due to high relative uncertainties in N bg . In conclusion, the slope method is superior to the integration method in example 2, where the integration method fails to output the EF with high certainty. The EFs output by the integration method in example 1 have higher certainty but they slightly depend on the chosen background measurement time range. Table S2: Comparison of the integration and slope methods for the example chase events presented in Fig. S1. DBG denotes the background concentrations are calculated from the time range of dedicated background measurements before/after the chase event (DBG1/2). DBG' denotes that the data from DBG2' are used instead of DGB2 in the example 2. NBG denotes the background concentrations are calculated from the time range of the nearest background measurements right before/after the chase event (NBG1/2). All particle number concentrations refer to particles larger than 23 nm.

S-15
Overall, as the slope method does not explicitly need information on the background concentrations, it does not suffer from EFs depending on the chosen background measurement time range. It is also a very fast method because there is no need to examine the background measurement time ranges and because the robust regression method presumably neglects signals caused by other nearby vehicles. The slope method also seems to work well even when the EF is so low that the increase in the particle number concentration caused by the studied vehicle is very low (example 2). This capability of the slope method arises from the theories behind the methods: unlike in the integration method where the EF is calculated from the excess pollutant and [CO 2 ] concentrations with respect to the background concentrations (eq S13), the calculation of the EF in the slope method is based on the excess pollutant and [CO 2 ] concentrations with respect to their concentrations at "another time moments" (eqs S14-S17). Whereas the background concentrations for the integration method can only be measured at a different time from the chase event, the data from "the another time moments" for the slope method are measured continuously during the chase event.
The linear fits from the slope method should, in theory, point the background concentration of particle number or CO 2 if the background concentration of the other one is known. gives the EF of 117 × 10 13 /kg CO 2 , which is very near the EF output by the slope method (114.3 × 10 13 /kg CO 2 ). Table S3 presents similar comparison for the example chase events presented in Fig. 2.
The integration method and the slope method with ordinary regression should give similar results. In many cases the uncertainties are so high that this expectation holds within the S-16 Table S3: Comparison of the integration and slope methods for the example chase events presented in Fig. 2. All values refer to particle number emission factors of particles larger than 23 nm in unit 10 13 /kg CO 2 . The integration method is conducted using the dedicated background measurements. The cases where the integration and the slope method with ordinary regression agree are marked as bold values. Fig. 2  EFs in the unit of 1/kg fuel can be further converted to the unit of particle number emitted per 1 km driving (1/km) for light-duty (LD) vehicles if the fuel consumption is known.
Here we use the median values of fuel consumptions from combined driving of all measured vehicles for which the values can be obtained from the national Traficom's database using their register numbers (6.1 l/100 km for LD diesel vehicles and 6.5 l/100 km for LD gasoline vehicles). The densities of the fuels are assumed to be 820 kg/m 3 for diesel and 748 kg/m 3 for gasoline, S13 representing average road-use fuels used in Finland. For heavy-duty (HD) vehicles, EFs in the unit of 1/kg fuel are converted to the unit of particle number emitted per 1 kWh of energy produced (1/kWh) by dividing the EFs by the heat value of diesel (12 kWh/kg fuel S14 ) and by the average of typical brake thermal efficiencies of HD vehicles, 43%-46%, S15 (44.5%). The equations for all the versions of EFs are assembled in Table S4.
S-18  Figure S2: Emission factors of particles larger than 23 nm (EF >23 ) of all vehicles, obtained using ordinary regression, instead of robust regression as in Fig. 3. Whereas the EF data shown in the both subplots are equal (from ordinary regression), their sorting is different: the vehicles are sorted according to their emission factors obtained (a) using ordinary regression and (b) using robust regression, i.e., the indices in (b) are equal to the ones in Fig. 3 but the EF data not. The error bars denote the standard errors of the slopes in the regressions. The EF data are shown as medians (with the standard deviations, σ) and means for the vehicle categories (see the footnote for the arrangement of the values). The units of EFs for light-duty (LD) vehicles are 10 13 /km and for heavy-duty (HD) vehicles 10 13 /kWh. The number of vehicles in each vehicle category is represented by n. The data in the upper part of the table are presented also graphically in Fig. 5. The lower part presents the data for the particle size bins.